Supporting Information: Biophysical informatics reveals distinctive phenotypic signatures and functional diversity of single cell lineages

Additional Results


Temporal stability of SCC morphological profiles
To evaluate whether the observed morphological differences between clonal cell lineages are attributable to temporary fluctuations or to persistent, heritable differences, we performed additional experiments in which we selected a subset of three distinct clone lineages each from the set of YM and YMR clonal lineages (6 total) and continued our analysis through multiple (3-4) passage cycles over multiple weeks. These experiments showed that the derived biophysical characteristics, both at the single cell level and the cell network level, for each of these lineages were largely conserved across generations, with relatively minor drift (Supplementary fig 6A, B,C,D). Hierarchical clustering and UMAP dimensionality reduction of the clonal lineage generations across passage cycles grouped generations of the same clonal line together, evidence that real differences exist between distinct clonal lineages that persist over many generations (Supplementary fig 6E,F,G,H).

Comparisons with prior work
A number of similar previous works have investigated collective cell behavior in the context of tumor growth and invasion using a range of computational methods. In particular, [2] and [1] examine a physical phase transition between jammed and unjammed states that occurs in densely-packed cell populations. These studies also utilize single cell measurements and cell network-like representations in order to quantify behavior.
Our informatics approach, which incorporates deep learning segmentation, shares many similarities with these previous approaches. We similarly calculated parameters describing cell shape and neighboring order-analogous to measuring vertex degree of an adjacency graph. The only difference here is that in our method, adjacency is determined by contact at the cell boundary rather than cell centroid-to-centroid distance.
In this study, we did not look closely at collective cell phase transitions. The main reason for this is that we were primarily interested in the question of cell/cell population heterogeneity, so we designed our experiments and computational analysis to perform with a very high throughput, allowing us to analyze many cell lines simultaneously. As a result, we collect data with a relatively low temporal resolution (imaging once every 12 hours), which makes it difficult to precisely identify jamming/unjamming phase transitions that often depend on high resolution dynamic behavior.
Instead, we focused on deriving a broader set of morphological, local topological, and global topological features and combining these into an extensive biophysical signature. Clustering lineages based on all metrics simultaneously, as was done here, may help to differentiate between cell populations and identify heterogeneity.
At the same time, we remain interested in extending these methods to the study of biological phase transitions. A promising future direction would be to investigate how heterogeneity, both within and between cell populations, affects jamming and unjamming transitions. Here, metrics calculated using the methods described in these previous studies could feed into the informatics analysis to construct a more detailed biophysical picture.

Single-cell Clone Generation
Individual cells were deposited into wells of 96 well plates using a FACSAria cell sorter. Cells were unstained and no selection criteria was applied, resulting in clonal populations that represent a random sampling from each of the parental populations. Clonal cells are expanded under the same conditions described above until they reach 90% confluency, at which point they are suspended using a 0.05% trypsin solution (Catalog#25200056, Gibco), and reseeded into a 6 well plate at a concentration of about 1300 cells/cm 2 . These plates are cultured in the same media formulation and under the same conditions described above. Parental lineages are named YM P or YMR P for the YUMM parental and YUMMER parental lineages respectively. SCC lineages are named with either YM or YMR designating the originating cell line and a tag ranging from pre1 to 16.6 designating the single parental cell from which the clones are generated. All clones were isolated and expanded under identical conditions. All clones and parental lines were imaged under identical conditions.

2D Image Acquisition
Imaging for all clonal populations began 24 hours after the first seeding at low confluency in a 6 well plate and typically continued at 12 hour increments until 96 hours after initial seeding, or over one four day subculture cycle. All images used for the 2D morphological analysis were taken on a Leica DMi1 phase contrast microscope at 5x magnification with a 0.12 numerical aperture.

Image segmentation model training and verification
We implement the widely used Mask R-CNN architecture to perform instance segmentation on phase contrast images through the Detectron2 python library. This modular architecture is well suited for this task; it detects and segments individual cells, allowing us to extract both cell shape characteristics and population morphologies from a single segmentation output. Training images are input as 278 2 µm 2 (256 2 px 2 ) tiles. Together, these include 4,946 single-cell annotations which were produced manually by a human annotator. We employ data augmentation, including brightness/contrast adjustment, mirroring, rotation, and cropping, applied sequentially, to artificially expand the number of distinct training examples. Training occurred for 10,000 epochs.
Quantification of large scale cell networks requires segmentation over larger areas than 278 2 µm 2 . We achieve segmentation for larger images with a sliding 256 2 px 2 window and a 128 px step size in x and y. As each region of the image is analyzed four times, the cumulative segmentation output from this process contains numerous duplicates. Pruning of these outputs is a crucial step to achieve accurate cell number and accurate segmentations. We implement a custom non-max-suppression algorithm that identifies duplicate masks based on the area overlap of the polygon masks and selects for objects with a higher classification confidence score. We additionally bias the algorithm to select for larger cells, ensuring that cells on the border of a prediction tile are not prioritized over cells in the center of a prediction tile. We note that this polygon non-max-suppression algorithm is computationally expensive compared to commonly used bounding box non-max-suppression algorithms for images with many object predictions. While this algorithm proved necessary for the images in this study, it may not be necessary for analyzing other images or cell types. Code for testing and training the segmentation model, as well as for the polygon and bounding box non-maxsuppression algorithms is available at https://github.com/trevorchan/Melanoma NetworkMorphology.

Single-cell morphological quantification
Every cell in segmentation output is characterized by four morphological variables: area, perimeter, circularity (calculated as area/perimeter), and aspect ratio. For every image, we calculate the average and the variance of these properties for all cells in a single image.

Population morphological quantification
Cell population morphology describes 2D cell organization in one field of view. Cell networks are generated from segmented outputs. Every cell in the segmentation output corresponds to a node in the graph, and edges are constructed between adjacent cells, here defined as two cells with a minimum edge-to-edge distance less than 5 microns. From the resulting network, we derive a handful of relevant topological properties, including: average vertex degree, the average number of adjacent cells for each cell; number of components; max component mass, the proportion of all nodes in the network belonging the largest component; degree variance, the variance in vertex degree for all nodes in a network; and chromatic number, a measure of colorability that is closely associated with graph degeneracy. Network generation, visualization, and topological property calculations utilize the NetworkX python package. We calculate fractal dimension from binarized images (produced from merging output masks of segmented images) using the box counting method.

Lineage characterization
As both single-cell morphology and population morphology are highly dependent on cell density, single time point snapshots are insufficient for robustly characterizing differences between clonal lineages. Images are acquired over multiple days as described above and morphological and topological quantities are derived for each. A lineage is therefore described by a collection of parameters in time and with respect to density. For this analysis, we ignore time and focus solely on cell density as the independent variable along which all morphological and topological quantities are compared. For each quantity with respect to density, we perform a polynomial regression and use the resulting fit to derive a set of general growth parameters that can be used to compare across lineages.

Unsupervised clustering and visualization
Clustering of SCC populations was accomplished using the Python SciKit-learn hierarchical clustering algorithm. Selection of the 6 morphological subclasses was performed manually by selecting an even cut across the population dendrogram at a user specified height. Resulting subclasses with a size less than 2 were excluded from further analysis, with the reason being outlier morphologies are both sufficiently rare and difficult to verify.

3D spheroid experiments
We followed a common spheroid generation protocol modeled after that described in (42). 1000 cells were deposited into each well of an agarose treated 96 well plate and centrifuged to pellet. Plates were cultured for 96 hours during which cells consolidated into spheroids and grew slightly. After 96 hours, spheroids were removed from the agarose plate and suspended in a collagen I solution with a final collagen concentration of 2 mg/ml. Collagen gels with a volume of 30 µl and containing one spheroid each were deposited onto the surface of a 24 well glass-bottomed plate. The collagen solution was allowed to polymerize at 37°C, and the plate was periodically inverted during the initial gelation phase ( 10 minutes) to ensure spheroid suspension in the gel. 1 hour after the start of polymerization, media or media and drug solution was added. Imaging of spheroids in brightfield and reflectance occurred every 24 hours starting on day 0 immediately following the addition of media and ending on day 3 (t=72 hrs), after which cells were fixed and prepared for immunofluorescence imaging.
We fixed plates with 4% paraformaldehyde and for 30 minutes at 37°C and washed plates with PBS. We then permeabilized with 0.1% Triton x-100 in PBS for 30 minutes at 37°C and washed with PBS. We blocked with 1% BSA solution in PBS for 1 hour at 20°C . We used Hoechst and phalloidin to fluorescently label nuclei and actin in 3D culture and image using a Leica SP-8 confocal microscope. All image stacks used for quantification are taken with a 10x magnification, 0.4 numerical aperture objective and a z-step size of 5 µm. Quantification of spheroid characteristics from the IF imaging data is explained in detail in the results section.

Supplementary Figures
Supplementary Figure 1. SCC lineages are characterized by 22 total variables describing single cell morphology and cell network morphology over time. (A,B) Single cell morphological variables (A) and cell-network variables (B) are shown plotted against cell density. Here YM and YMR parental and clonal lineages are plotted. Each point corresponds to a single image and contains data corresponding to the derived cell network and hundreds to thousands of cells. The collection of all data points from one clone over the course of a four day expansion period constitute a unique morphological progression. We use polynomial regression to fit curves to each clone morphological progression. (E,F) UMAP clustering of YM and YMR populations showing morphological subclasses and highlighting representative SCC lineages used for downstream 3D analysis. Density is depicted as a color gradient ranging from 0 cells/µm 2 to 5E-4 cells/µm 2 . Figure 5. 11 total derived spheroid variables describe morphology and invasion in a 3D spheroid assay. UMAP clustering of individual spheroids generates groups consistent with the 2D analysis based lineage clustering.

Supplementary
(A, B) A total of 11 variables are used to describe morphology and invasion characteristics in a 3D spheroid assay.
(C, D, E) UMAP clustering of spheroid data by lineage shows colocalization of spheroids with others from the same population (C) and from the same morphological subclass (D, E).
(F) A hierarchical clustering of all 10 SCC lineages from both cell lines considering all 2D and 3D variables largely reiterates the findings of the 2D-only clustering and the 2D+3D-clustering on YM and YMR separately. Except for YMR pre21, lineages group with others of their cell line, and when present, others of their subclass. YMR pre21 falls across the cell line boundary on the dendrogram in 2D+3D, but it did not in 2D alone (fig 3a).